Abstract.

Sepsis is a highly fatal disease that causes many casualties all over the world; about
270,000 people die of sepsis annually. Early detection can therefore help prevent deaths
and bring relief to the families of sepsis patients. Different researchers have worked on
sepsis detection and prediction, but the need for an improved detection model remains. We
compared various machine learning algorithms for sepsis detection using the dataset that
is publicly available to all researchers at Physionet.org. The dataset contains many empty
(null) values, which we filled using backward filling and forward filling; missing values
of MAP were calculated using equation (1), which gives more precise results. We divided
the 40,336 files of datasets A and B into an 80% training set and a 20% testing set. We
applied the algorithms twice: once using the vital signs and clinical values of the
patients, and once using the vital signs only. Using vital signs only, the training
accuracy of KNN, Random Forest, Logistic Regression, MLP, and Decision Trees was 0.992,
0.999, 0.981, 0.981, and 0.981 respectively, while the testing accuracy of KNN, Random
Forest, Logistic Regression, MLP, and Decision Trees was 0.987, 0.983, 0.980, 0.981, and
0.981 respectively. For Sepsis Label 0, the precision of KNN, Random Forest, Decision
Trees, Logistic Regression, and MLP was 0.99, 0.98, 0.98, 0.98, and 0.98 respectively,
while the recall was 1.00 for all five algorithms. The comparison of the above-mentioned
algorithms showed that KNN leads all its competitors in accuracy, precision, and recall.
Keywords: Sepsis detection; Machine Learning; KNN; MLP; Random Forest; Logistic 
Regression; decision trees; forward filling technique; backward filling technique. 
INTRODUCTION 

Sepsis is initiated when an infection enters the human body. The infection leads to
disorders of the internal organs, including the heart, lungs, and kidneys, and this can
ultimately result in the death of the patient. About 35% of septic patients die annually,
and 24% of the budget of the USA is consumed annually for the diagnosis and treatment of
sepsis [2]. The disease is caused by an abnormal response of internal body tissues to
infection; this unbalanced response indicates a weakened immune system [3]. The sepsis
lifecycle starts with the entry of infection into the human body. The infection proceeds
to the lungs and infects them; the infected lungs then carry the infection to the heart,
from where it spreads throughout the whole body via the blood vessels. This causes
disorders of the internal organs, which ultimately cause the death of the infected person
[3]. Figure 1 shows the lifecycle of sepsis.

Electronic health records are commonly used for training and testing sepsis models.
Delahanty et al. [3] used the EHR (Electronic Health Record), which comprised one of
seven departments of the Cerner Millennium. Administrative credentials were obtained
through the usual course of the hospital's SOPs. The data were stored at Tenet's
warehouse and then retrieved from the database using SQL. They used gradient boosting
for the detection of sepsis. Most research on sepsis has focused on specific patient
conditions, and each study has used a different sepsis definition. Calvert et al. [4]
proposed the InSight model for early detection of sepsis (EDOS) using the SIRS (systemic
inflammatory response syndrome) criteria. Reyna et al. [2] used the dataset available at
Physionet [1]; they used k-means clustering for training the algorithm and a decision
tree for recommending the presence of sepsis in the patients. They used k-means
clustering because it makes the representation of non-linearly spread values more robust
and aligned [5].

Figure 1. Sepsis Life Cycle.

We applied multiple machine learning algorithms to a dataset that is publicly
available at Physionet.org [1]. The dataset contains clinical values as well as vital
signs of sepsis patients. We applied the algorithms in two different ways: once using
vital signs only, and once using both vital signs and clinical values. We compared the
algorithms based on accuracy, precision, recall, and F1-measure.

Sepsis detection and prediction have become easier thanks to publicly available
datasets, which are easy to manipulate and can be used to train and test algorithms in
order to obtain improved models [4]. Most of the recent research on sepsis detection
using clinical values has used the publicly available dataset provided by Physionet.org
[1]. The Physionet platform provides extensive information and guidance for using the
dataset: it explains how to access the dataset, shows its tabular form, and defines all
the attributes present in it. It also states the limitations of the dataset, i.e., there
are empty values denoted by NaN. Many clinical values of the patients are missing, i.e.,
no hourly update of the clinical values is available throughout the dataset, although a
few of them are recorded frequently for some patients [6]. Shankar-Hari et al. [6]
applied the forward filling technique to fill in the missing values of each column.

Forward Filling 

Forward filling is a simple technique for filling the missing values in a CSV or
Excel file: a missing value is replaced by the previous value in the same column. If no
previous value is found, the cell remains unchanged, i.e., it stays empty [2].
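As a minimal sketch of forward filling, assuming the hourly records of one patient are
loaded into a pandas DataFrame (the use of pandas, the column names HR and Temp, and the
values are illustrative assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly records for one patient; NaN marks a missing measurement.
records = pd.DataFrame({
    "HR":   [np.nan, 82.0, np.nan, np.nan, 90.0],
    "Temp": [36.8, np.nan, np.nan, 37.4, np.nan],
})

# Forward filling: each missing value takes the most recent earlier value
# in the same column. A leading gap has no earlier value and stays NaN.
filled = records.ffill()
print(filled)
```

The first HR entry stays empty because nothing precedes it, which is the case that
backward filling is meant to handle.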

In a comparison of machine learning approaches, Hsu et al. [7] evaluated
different methods including Naive Bayes, LLSE (Linear Least Squares), Hidden Markov
Model, Support Vector Machine, LS-Gradient, and Random Forests. They used LLSE as a
baseline and experimented with Online Lasso and SQR as online learners [8]. Bilal et al.
[9] applied machine learning algorithms for the detection of initial and severe
conditions of sepsis. They utilized the electronic health records of the patients for
training and testing the model, and they selected the patients' records with the initial
eight vital signs for sepsis prediction. The vital signs they used include Temp, HR,
SBP, and MAP. They applied the SIRS criteria to define sepsis and used deep learning
algorithms for classification, comparing Adaptive CNN, RNN-LSTM, and an SVM with a
quadratic kernel. The training accuracy for the SVM with a quadratic kernel, RNN-LSTM,
and Adaptive CNN was 78.00%, 92.72%, and 93.84% respectively, and the testing accuracy
was 68.00%, 91.10%, and 93.18% respectively.

Jia et al. [10] utilized the dataset publicly available at Physionet [1]. They
applied three machine learning algorithms, namely random forest, decision tree, and
logistic regression, combined with multiple autoencoders, and compared the resulting
accuracy values. The autoencoders were TAE (Temporal Autoencoder), SAE (Spatial
Autoencoder), STAE (Spatial-Temporal Autoencoder), and TSAE (Temporal and Spatial
Autoencoder). The random forest gave a maximum accuracy of 72.2% using TSAE, the
decision tree gave a maximum accuracy of 67.9% using TAE, and logistic regression gave a
maximum accuracy of 60.4% using TAE.

Waseem et al. [13] applied an LMA-based ANN to big data for controlling air
conditioners. They selected a calibration network architecture, and the number of
neurons was chosen according to the requirements. They divided the dataset into 75%
training and 25% testing, and the evaluation was based on mean square error, mean
absolute error, and mean absolute percentage error. Zhenzi et al. [14] worked on power
system planning and prediction; they used case studies of various countries to
differentiate between maximum and minimum temperatures on certain days.

The main contributions of our work are as follows:

• We propose a novel set of integrated features, which gives a better approach to
  sepsis detection using machine learning.

• The proposed system is capable of successfully classifying sepsis and non-sepsis
  cases for early detection of sepsis.

• Our method is capable of detecting sepsis infection with high accuracy, precision,
  recall, and F1-score.

• We compared our system with state-of-the-art models, and our proposed model
  achieved the best performance.

The remainder of the paper is organized as follows. Section 2 discusses the
proposed methodology, including data pre-processing, filling in missing values, feature
extraction, and classification. Section 3 presents the experimental results and
discussion. Finally, Section 4 concludes our work.

PROPOSED METHODOLOGY 

This section presents a detailed description of the proposed sepsis detection
system. The main objective of the proposed framework is to differentiate between sepsis
and non-sepsis cases in the input data. The proposed system comprises three stages: data
pre-processing, feature extraction, and classification. The input dataset was downloaded
from Physionet [1] and contains two sets, A and B, both consisting of psv files with
hourly records. Set A contains the hourly records of more than 20,000 patients and Set B
contains the hourly records of about 20,000 patients. Initially, the dataset contained
about 31% empty values, which were filled in by applying two effective data filling
approaches, forward filling and backward filling. After that, the dataset still
contained many empty values of MAP and DBP: in some cases MAP values were missing, and
in other cases MAP values were present but DBP values were missing. We applied the
standard formula relating DBP and MAP. Equation 1 shows the formula, which gives only a
minor difference between the calculated and the measured values of MAP [11].

$$\mathrm{MAP} = \mathrm{DBP} + \frac{1}{3}\,(\mathrm{SBP} - \mathrm{DBP}) \tag{1}$$
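The relation MAP = DBP + (SBP - DBP)/3 can be illustrated with a short sketch. The
function names and the example reading are hypothetical, and recovering DBP by
rearranging the same relation is an assumption about how the missing DBP values could
be filled, not a confirmed detail of the pipeline:

```python
def mean_arterial_pressure(sbp: float, dbp: float) -> float:
    """Estimate MAP from systolic and diastolic pressure: DBP + (SBP - DBP)/3."""
    return dbp + (sbp - dbp) / 3.0


def diastolic_from_map(map_value: float, sbp: float) -> float:
    """Rearrange the same relation to recover DBP when MAP and SBP are known."""
    # MAP = DBP + (SBP - DBP)/3  =>  DBP = (3*MAP - SBP) / 2
    return (3.0 * map_value - sbp) / 2.0


# Illustrative reading: SBP = 120 mmHg, DBP = 80 mmHg.
estimated_map = mean_arterial_pressure(120.0, 80.0)
recovered_dbp = diastolic_from_map(estimated_map, 120.0)
print(round(estimated_map, 2), round(recovered_dbp, 2))
```

For this reading the estimated MAP is about 93.33 mmHg, and rearranging the relation
returns the original DBP of 80 mmHg.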

After filling and pre-processing the data, we extracted those patients whose
clinical values were available for frequent hours, and we kept only patients with more
than one observation for each clinical value. Moreover, we kept only patients with a
prediction window between 3 and 24 hours, and finally we sent the resulting list of
patients to the classifier for training. We then appended the 20,336 files of set A and
the first 12,000 files of set B to the training.csv file, and the remaining 8,000 files
of set B to the testing.csv file. These training and testing files contained the vital
signs and clinical values of the patients. Next, only the vital signs of both
training.csv and testing.csv were selected and saved to two further CSV files named
vital_training.csv and vital_testing.csv. We applied the classifiers in two ways: first
on the training and testing sets with vital signs and clinical values, and second on the
training and testing sets with vital signs only. We applied Random Forest, KNN, Decision
Trees, MLP, and Logistic Regression and compared their results. The detailed working
mechanism of the proposed system is shown in Figure 2.
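The partition of the patient files can be sketched as follows. The helper name
split_patient_files and the placeholder file names are hypothetical; only the counts
(20,336 files in set A, 20,000 in set B, the first 12,000 of set B used for training)
come from the text:

```python
from typing import List, Tuple


def split_patient_files(
    set_a: List[str], set_b: List[str], n_b_train: int = 12_000
) -> Tuple[List[str], List[str]]:
    """Return (training_files, testing_files) for the described partition."""
    # All of set A plus the first n_b_train files of set B form the training set.
    training = list(set_a) + list(set_b[:n_b_train])
    # The remaining files of set B form the testing set.
    testing = list(set_b[n_b_train:])
    return training, testing


# Placeholder file names standing in for the per-patient records.
set_a_files = [f"setA/p{i:06d}.psv" for i in range(20_336)]
set_b_files = [f"setB/p{i:06d}.psv" for i in range(20_000)]

train_files, test_files = split_patient_files(set_a_files, set_b_files)
share = len(train_files) / (len(train_files) + len(test_files))
print(len(train_files), len(test_files), round(share, 3))
```

With these counts the training set holds 32,336 patients and the testing set 8,000,
which is roughly the 80%/20% ratio described. Splitting by whole patient files keeps
all hourly rows of one patient on the same side of the split.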


[Figure 2 flowchart: Training Set A (20,336 patients) and Training Set B (20,000 patients)
-> Forward Filling and Backward Filling -> MAP, DBP Filling -> Training Set (32,336
patients) and Testing Set (8,000 patients) -> Vital Signs only / Vital Signs and clinical
values -> Classification]

Figure 2. Proposed System.

Backward filling 

Backward filling is a simple and effective filling technique that is required
when the initial row is empty and the subsequent rows have valid values, because such an
empty row cannot be filled in using forward filling [2].
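A minimal sketch of how backward filling complements forward filling, again assuming
pandas and a hypothetical HR column whose first rows are empty:

```python
import numpy as np
import pandas as pd

# Hypothetical column whose first two rows are empty.
hr = pd.Series([np.nan, np.nan, 78.0, np.nan, 85.0], name="HR")

# Forward filling alone cannot fill the leading gap ...
after_ffill = hr.ffill()

# ... so backward filling copies the next valid value upward into it.
after_both = after_ffill.bfill()
print(after_both.tolist())
```

Applying forward filling first and backward filling second means that later gaps take
the previous measurement, and only the leading gap is filled from a later one.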
Machine Learning Algorithms 

K-nearest neighbors (KNN) [22] is a supervised learning algorithm used to solve
both regression and classification problems [12]. Random forest [22] is a flexible
machine learning algorithm that produces results even without tuning of its
hyper-parameters [15]. A decision tree [22] is a supervised learning technique used
mostly for classification problems. It has a pre-defined target variable, i.e., the
actual values are present in the testing data and the predicted values are compared with
them [16]. Logistic regression [22-26] is a regression-based algorithm that is applied
when the target variable is binary, i.e., positive or negative [17][24].
Neural Networks 

MLP (Multi-Layer Perceptron) is a feed-forward neural network: it takes a set of
input values, passes them forward through its layers, and generates a set of output
values [20].
Tuning Parameters

Each algorithm was tuned to the input parameters at which it attained its maximum
values for the evaluation metrics. Logistic regression was tuned at max_iteration =
1000, Random Forest at no. of estimators = 100, the Decision Trees algorithm at
max_depth = 5 layers, MLP at alpha = 1 and max_iteration = 1000, and KNN at depth = 5
layers. These parameters produced the results for all the evaluation metrics, and among
the algorithms KNN ranked highest.
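The reported settings can be written down as a configuration sketch. The use of
scikit-learn is an assumption, since the text does not name the library; the parameter
names below are scikit-learn's, and reading the KNN setting of 5 as the number of
neighbours is also an assumption, because KNN has no depth parameter:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Settings stated in the text, expressed with scikit-learn parameter names.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Decision Trees": DecisionTreeClassifier(max_depth=5),
    "MLP": MLPClassifier(alpha=1, max_iter=1000),
    # Assumption: the KNN setting of 5 is read as the number of neighbours.
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in classifiers.items():
    print(name, type(model).__name__)
```

Each model would then be fitted on the training set and scored on the testing set, once
with vital signs only and once with vital signs and clinical values.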

EXPERIMENTAL RESULTS AND DISCUSSION 
Dataset 

We used the input dataset downloaded from Physionet [1]. The dataset contains two
sets, A and B, both consisting of psv files with hourly records. Set A contains the
hourly records of 20,336 patients and Set B contains the hourly records of 20,000
patients. The dataset contains about 31% empty values denoted by NaN (not a number);
these empty values indicate that the corresponding measurements were not taken at the
time the dataset was organized. The first eight columns of the dataset hold the vital
signs, the following 26 columns contain the clinical values of the patients, and the
last 5 columns contain demographic values of the patient, including age, gender, ICU
entry time, etc.

Initially, the dataset contained empty values; after applying forward filling and
backward filling, most of the empty values were filled in. Even then, further
pre-processing was needed to make the dataset suitable for detecting sepsis accurately
and precisely, so we applied the standard equation for calculating the value of MAP.

After filling in the missing values of the dataset, we proceeded to the
distribution of the training and testing sets. We divided set A and set B in the ratio
of 80% training set and 20% testing set: we placed all 20,336 files of set A and 12,000
files of set B in the training set, and the remaining 8,000 files of set B in the
testing set. In this way, a standard partition into training and testing sets is
obtained, so that the algorithms can be trained on sufficient data and tested on
suitable testing data. We then selected the vital signs and saved the training and
testing sets in two separate CSV files named train_patient_vitals.csv and
test_patient_vitals.csv respectively. Five machine learning algorithms were then
applied; first we applied the algorithms using only the vital signs and recorded the
accuracy of all five.

Evaluation Metrics 

To evaluate the performance of the proposed system, we used accuracy, precision,
recall, and F1-score; higher values of these metrics indicate better classification
performance in detecting sepsis. We compared the performance of our method with baseline
methods and other existing systems based on accuracy, precision, recall, and F1-score.
The training accuracy values for MLP, K-Nearest Neighbours, Decision Trees, Random
Forest, and Logistic Regression were 0.981, 0.992, 0.981, 0.999, and 0.981 respectively,
and the testing accuracy values for MLP, K-Nearest Neighbours, Random Forest, Decision
Trees, and Logistic Regression were 0.981, 0.987, 0.983, 0.981, and 0.980 respectively.
Table 1 shows the training and testing accuracy values for KNN, Random Forest, Logistic
Regression, Decision Trees, and MLP using vital signs only.

Table 1. Accuracy using vital signs only.

Model Name            Training Accuracy   Testing Accuracy
K.N.N.                0.992               0.987
Random Forest         0.999               0.983
Decision Trees        0.981               0.981
Logistic Regression   0.981               0.980
MLP                   0.981               0.981


Precision and recall evaluate the performance of the models for sepsis detection.
Equation 2 shows the formula for calculating precision, while equation 3 shows the
formula for calculating recall. We calculated precision and recall using vital signs
only for SepsisLabel 1 and 0, and then calculated their average. For SepsisLabel 0, the
precision of KNN, Random Forest, Logistic Regression, MLP, and Decision Tree was 0.99,
0.98, 0.98, 0.98, and 0.98 respectively; for SepsisLabel 1, it was 0.77, 0.95, 0.20,
0.00, and 0.00 respectively; and the average precision was 0.95, 0.90, 0.50, 0.50, and
0.50 respectively. For SepsisLabel 0, the recall of KNN, Random Forest, Logistic
Regression, MLP, and Decision Tree was 1.00 for all five models; for SepsisLabel 1, it
was 0.50, 0.14, 0.00, 0.00, and 0.00 respectively; and the average recall was 0.75,
0.57, 0.50, 0.50, and 0.50 respectively. Table 2 shows the precision and recall of the
machine learning models for SepsisLabel 0, SepsisLabel 1, and their average.


$$P = \frac{TP}{TP + FP} \tag{2}$$

$$R = \frac{TP}{TP + FN} \tag{3}$$

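Precision, TP / (TP + FP), and recall, TP / (TP + FN), can be computed directly from
confusion counts. The sketch below uses hypothetical counts for the positive class;
returning 0.0 when a denominator is zero is a convention chosen here, and the unweighted
per-class mean is an assumption about how the averages were formed:

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def class_average(label0: float, label1: float) -> float:
    """Unweighted mean of the two per-class scores."""
    return (label0 + label1) / 2


# Hypothetical confusion counts for the positive class (SepsisLabel 1).
tp, fp, fn = 50, 15, 50

p = precision(tp, fp)
r = recall(tp, fn)
print(round(p, 2), round(r, 2), class_average(0.98, 0.00))
```

A model that never predicts the positive class has TP = 0, so both its precision and
recall for SepsisLabel 1 are 0.00 even though its scores for SepsisLabel 0 stay close
to 1.00.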
Table 2. Precision-Recall using vital signs only.

Model Name            Sepsis Label   Precision   Recall
K.N.N.                0              0.99        1.00
                      1              0.99        0.90
                      Avg.           0.99        0.95
Random Forest         0              0.99        1.00
                      1              0.96        0.80
                      Avg.           0.97        0.90
Decision Trees        0              0.98        1.00
                      1              0.00        0.00
                      Avg.           0.49        0.50
Logistic Regression   0              0.98        1.00
                      1              0.15        0.00
                      Avg.           0.56        0.50
MLP                   0              0.98        1.00
                      1              0.00        0.00
                      Avg.           0.49        0.50


The F1-score addresses the inverse relation between precision and recall, in
which precision decreases as recall increases [19], by combining both into a single
value. The F1-score was calculated for K-Nearest Neighbours, Decision Trees, Logistic
Regression, Random Forest, and MLP. For SepsisLabel 0 using vital signs only, the
F1-score for K-Nearest Neighbours, Decision Trees, Logistic Regression, Random Forest,
and MLP was 0.99 for all five models. For SepsisLabel 1, the F1-score for KNN, Decision
Trees, Logistic Regression, Random Forest, and MLP was 0.61, 0.00, 0.01, 0.24, and 0.00
respectively, while the average F1-score for KNN, Random Forest, Logistic Regression,
MLP, and Decision Tree was 0.80, 0.49, 0.50, 0.61, and 0.49 respectively. Table 3
presents the F1-scores of the machine learning models using vital signs only.
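As a worked check, the F1-score is the harmonic mean of precision and recall,
F1 = 2PR / (P + R). The sketch below applies it to the SepsisLabel 1 precision and
recall quoted in the text for KNN (0.77, 0.50) and Random Forest (0.95, 0.14); the
function name is illustrative:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SepsisLabel 1 precision and recall as quoted in the text (vital signs only).
knn_f1 = f1_score(0.77, 0.50)
rf_f1 = f1_score(0.95, 0.14)
print(round(knn_f1, 2), round(rf_f1, 2))  # prints 0.61 0.24
```

These values agree with the SepsisLabel 1 F1-scores of 0.61 and 0.24 quoted in the
text. Because the harmonic mean is dominated by the smaller of the two inputs, a high
precision cannot compensate for a low recall.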

Table 3. F1-Score using vital signs only.

Model Name            Sepsis Label   F1-Score
K.N.N.                0              0.99
                      1              0.99
                      Avg.           0.99
Random Forest         0              0.99
                      1              0.96
                      Avg.           0.97
Decision Trees        0              0.98
                      1              0.00
                      Avg.           0.49
Logistic Regression   0              0.98
                      1              0.15
                      Avg.           0.56
MLP                   0              0.98
                      1              0.00
                      Avg.           0.49


In the second phase, all five algorithms were applied to the dataset containing
both the vital signs and the clinical values of the patients, i.e., all the features of
the dataset. We trained the algorithms using the training set and then analyzed them on
the basis of their accuracy, precision, recall, and F1-score. Table 4 shows the accuracy
of all the above-mentioned algorithms. The training accuracy values for K-Nearest
Neighbours, MLP, Random Forest, Decision Trees, and Logistic Regression were 1.000,
0.997, 0.981, 0.981, and 0.981 respectively, and the testing accuracy values for MLP,
K-Nearest Neighbours, Random Forest, Decision Trees, and Logistic Regression were 0.995,
0.993, 0.981, 0.980, and 0.981 respectively.

Using the vital signs and clinical values of the patients, the precision for
Sepsis Label 0 of MLP, RF, Decision Trees, Logistic Regression, and KNN was 0.99, 0.99,
0.98, 0.98, and 0.98 respectively. For Sepsis Label 1, the precision of KNN, Random
Forest, Logistic Regression, MLP, and Decision Tree was 0.96, 0.95, 0.20, 0.00, and 0.00
respectively, while the average precision was 0.97, 0.97, 0.59, 0.49, and 0.49
respectively. For Sepsis Label 0, the recall of KNN, Random Forest, Logistic Regression,
MLP, and Decision Tree was 1.00 for all five models, and for Sepsis Label 1 it was 0.50,
0.14, 0.00, 0.00, and 0.00 respectively, while the average recall for KNN, Random
Forest, Logistic Regression, MLP, and Decision Tree was 0. Table 5 shows the results of
precision and recall for the machine learning models using vital signs and clinical
values.

The F1-score was calculated for KNN (K-Nearest Neighbours), Decision Trees,
Logistic Regression, Random Forest, and MLP. For Sepsis Label 0 using the vital signs
and clinical values of the patients, the F1-score for KNN, Decision Trees, Logistic
Regression, Random Forest, and MLP was 0.99 for all five models, while for Sepsis Label
1 it was 0.61, 0.00, 0.01, 0.24, and 0.00 respectively. The average F1-score for KNN,
Random Forest, Logistic Regression, MLP, and Decision Tree was 0.80, 0.49, 0.50, 0.63,
and 0.49 respectively. Table 6 shows the F1-scores of the machine learning models using
vital signs and clinical values.

Table 4. Accuracy using vital signs and clinical values.

Model Name            Training Accuracy   Testing Accuracy
K.N.N.                0.997               0.995
Random Forest         1.000               0.993
Decision Trees        0.981               0.981
Logistic Regression   0.981               0.980
MLP                   0.981               0.981


Table 5. Precision-Recall using vital signs and clinical values.

Model Name            Sepsis Label   Precision   Recall
K.N.N.                0              0.99        1.00
                      1              0.96        0.90
                      Avg.           0.975       0.95
Random Forest         0              0.99        1.00
                      1              0.95        0.80
                      Avg.           0.97        0.90
Decision Trees        0              0.98        1.00
                      1              0.20        0.00
                      Avg.           0.59        0.50
Logistic Regression   0              0.98        1.00
                      1              0.00        0.00
                      Avg.           0.49        0.50
MLP                   0              0.98        1.00
                      1              0.00        0.00
                      Avg.           0.49        0.50

Table 6. F1-Score using vital signs and clinical values.

Model Name            Sepsis Label   F1-Score
K.N.N.                0              0.99
                      1              0.61
                      Avg.           0.80
Random Forest         0              0.99
                      1              0.00
                      Avg.           0.49
Decision Trees        0              0.99
                      1              0.01
                      Avg.           0.50
Logistic Regression   0              0.24
                      1              0.99
                      Avg.           0.63
MLP                   0              0.99
                      1              0.00
                      Avg.           0.49


Comparison with other methods 


We compared the results of our proposed model with the results of Hsu et al. [7].
Table 7 shows the accuracy values of KNN, Naive Bayes, SQR, SVM, CNN-LSTM + transfer,
and CNN-LSTM, which are 99.1%, 84%, 60%, 90%, 90%, and 95% respectively. The comparison
shows that KNN achieves the highest accuracy among these competitors. Jaccob et al. [21]
reported an accuracy of 90.9% for their proposed InSight algorithm; Table 8 compares the
proposed model with the results of Jaccob et al. [21].


Table 7. Comparison of Accuracy of proposed system with other systems.

Model Name                 Accuracy
K.N.N. [7]                 99.1%
Naive Bayes [7]            84%
SQR [7]                    60%
SVM [7]                    90%
CNN-LSTM + transfer [7]    90%
CNN-LSTM [7]               95%


International Journal of Innovations in Science & Technology, Feb 2022 | Vol 4 | Issue 1


Table 8. Comparison of Accuracy of proposed system with previous work.

Model Name        Accuracy
K.N.N. [21]       99.1%
InSight [21]      90.9%
SAPS II [21]      50%
SIRS [21]         20%
K.N.N_[21] 99 6 


The dataset contained many empty values and, in that state, could not be used to
train an algorithm that gives improved results. The data pre-processing techniques,
namely backward filling, forward filling, and the MAP calculation formula, produced a
filled dataset that is ready to be used. The results in Tables 1 and 4 showed that the
accuracy of all the algorithms increases when the input dataset contains the vital signs
as well as all the clinical values; KNN shows the best results among its peers in both
scenarios. Overall, the results showed that KNN leads all its competitors in accuracy,
precision, recall, and F1-score.
CONCLUSION 

This paper has presented a novel and reliable sepsis detection framework based on
machine learning. We evaluated its performance by employing several machine learning
algorithms, namely MLP, Decision Trees, Logistic Regression, Random Forest, and KNN, and
compared their results. The experimental results demonstrated that KNN outperformed the
other machine learning classifiers in terms of accuracy, precision, recall, and
F1-score. In the future, we would like to extend this work to a real-life implementation
that helps patients check whether a sepsis infection is present, with the help of
Android applications as well as web applications.